AUTOMATED DERIVATION OF THE ADJOINT OF HIGH-LEVEL 
TRANSIENT FINITE ELEMENT PROGRAMS 
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Abstract. In this paper we demonstrate the capability of automatically deriving the discrete ad- 
joint and tangent linear models from a forward model written in the high-level FEniCS finite element 
computing environment. In contrast to developing a model directly in Fortran or C++, high-level 
systems allow the developer to express the variational problems to be solved in near-mathematical 
notation. As such, these systems have a key advantage: since the mathematical structure of the prob- 
lem is preserved, they are more amenable to automated analysis and manipulation. Our approach to 
automated adjoint derivation relies on run-time annotation of the temporal structure of the model, 
and employs the same finite element form compiler to automatically generate the low-level code for 
the derived models. The approach requires only trivial changes to a large class of forward models, 
including complicated time-dependent nonlinear models. The adjoint model automatically employs 
optimal checkpointing schemes to mitigate storage requirements for nonlinear models, without any 
user management or intervention. Furthermore, both the tangent linear and adjoint models natu- 
rally work in parallel, without any need to differentiate through calls to MPI or to parse OpenMP 
directives. The generality and applicability of the approach is demonstrated with examples from a 
wide range of scientific applications. 
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1. Introduction. Adjoint models are key ingredients in many algorithms of 
computational science, such as in parameter identification |38j . sensitivity analysis [5], 
data assimilation [26] . optimal control [28], adaptive observations [40], predictability 
analysis [33], and error estimation 3 . While deriving the adjoint model associated 
with a linear stationary forward model is straightforward, the development and im- 
plementation of adjoint models for nonlinear or time-dependent forward models is 
notoriously difficult, for several reasons. First, each nonlinear operator of the forward 
model must be differentiated, which can be difficult for complex models. Second, the 
control flow of the adjoint model runs backwards, from the final time to the initial 
time, and requires access to the solution variables computed during the forward run 
if the forward problem is nonlinear. Since it is generally impractical for physically 
relevant simulations to store all variables during the forward run, the adjoint model 
developer must implement some checkpointing scheme that balances recomputation 
and storage |15j . The control flow of such a checkpointing scheme must alternate 
between the solution of forward variables and adjoint variables, and is thus highly 
nontrivial to implement by hand on a large and complex code. For parallel compu- 
tations, these difficulties are magnified by the fact that the control flow of parallel 
communications reverses in the adjoint solve: forward sends become adjoint receives, 
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Fig. 1.1: The traditional approach to developing adjoint models. The forward model 
is implemented by hand, and its adjoint derived either by hand or with the assistance 
of an algorithmic differentiation tool. 



and forward receives become adjoint sends |44j . 

The traditional approach to model development is to implement the forward code 
by hand in a low-level language (typically Fortran or C++). While this allows the 
programmer a high degree of control over each memory access and floating point 
operation, implementing these codes usually takes a large amount of time, and the 
mathematical structure of the problem to be solved is irretrievably interwoven with 
implementation details of how the solution is to be achieved. Then the adjoint code is 
produced, either by hand or with the assistance of an algorithmic differentiation (AD) 
tool (figure [O] ). Such AD tools take as input a forward model written in a low- level 
language, and derive the associated discrete adjoint model, through some combination 
of source-to-source transformations and operator overloading. However, this process 
requires expert knowledge of both the tool and the model to be differentiated [37J Pg- 
xii\. The root cause of the difficulty which AD tools have is that they operate on 
low-level code in which implementation details and mathematics are inseparable and 
therefore must both be differentiated: AD tools must concern themselves with matters 
such as memory allocations, pointer analyses, I/O, and parallel communications (e.g. 
MPI or OpcnMP). 

A variant of this approach is to selectively apply AD to small sections of the model, 
and then to connect and arrange these differentiated routines by hand to assemble the 
discrete adjoint equations [13l[7j[36]. This approach attempts to re-introduce as much 
as possible of the distinction between mathematics and implementation; however, it 
requires even more expertise than a naive black-box application of AD. 

Recently, the first three authors proposed a new, higher-level abstraction for de- 
veloping discrete adjoint models [TU]. While algorithmic differentiation treats models 
as a sequence of elementary instructions, this new abstraction treats the model as a 
sequence of equation solves. This offers an alternative approach to the development of 
discrete adjoint models, and is implemented in an open-source software library called 
libadjoint. 

When libadjoint is applied to a low-level forward code, the developer must an- 
notate the forward model. This involves embedding calls to the libadjoint library 
that record the temporal structure of the equations as they are solved. The recorded 
information is analogous to a tape in AD, but at a higher level of abstraction. Us- 
ing this information, libadjoint can symbolically manipulate the annotated system 
to derive the structure of the adjoint equations or the tangent linear equations. If 
the adjoint developer further supplies callback functions for each operator that fea- 
tures in the annotation and any necessary derivatives, libadjoint can assemble each 
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adjoint or tangent linear equation as required. The library therefore relieves the de- 
veloper of deriving the adjoint equations, managing the complex life cycles of forward 
and adjoint variables, and implementing a checkpointing scheme. With this strategy, 
the task of developing the adjoint model (which requires significant expertise) is re- 
placed with the tasks of describing and modularising it (which are usually much more 
straightforward) . 

The aim of this work is to apply libadjoint to automatically derive the adjoint 
of models written in a high-level finite element system. In such systems, the discrete 
variational formulation is expressed in code which closely mimics mathematical nota- 
tion. The low-level details of finite element assembly and numerical linear algebra are 
delegated to the system itself. For example, in the Sundance C++ library, the assem- 
bly is achieved by runtime Frechet differentiation of the specified variational form [33 . 
In the FEniCS environment, the variational form is passed to a dedicated finite ele- 
ment form compiler, which generates low-level code for its assembly [2H G2J 132] ■ By 
exploiting optimisations that are impractical to perform by hand, such systems can 
generate very efficient implementations |251 1391 133] . Moreover, a major advantage of 
this clean separation between mathematical intention and computer implementation 
is that it enables the automatic mathematical analysis and manipulation of the vari- 
ational form. The absence of this separation in low-level models inhibits automated 
analysis at the level of variational forms. In addition, high-level systems can often pro- 
vide automated variational form differentiation/linearisation capabilities [Tl I3"3l BT| . 
which can be of particular interest for adjoint models. 

The main contribution of this paper is a new framework for the automated deriva- 
tion of the discrete adjoint and tangent linear models of forward models implemented 
in the FEniCS software environment. The strategy is illustrated in figure [T72] In FEn- 
iCS model code, each equation solve may be naturally expressed as a single function 
call. This matches the basic abstraction of libadjoint exactly, and therefore makes 
the integration of FEniCS and libadjoint particularly straightforward. Consequently, 
for a large class of forward models implemented in FEniCS, libadjoint can automat- 
ically derive the discrete adjoint model with only minor additions to the code. The 
necessary derivative terms are automatically computed using the form linearisation 
capabilities of UFL pQ. The adjoint equations derived by libadjoint are themselves 
valid FEniCS input, and the low-level adjoint code is generated using the same finite 
element form compiler as the forward model. As a result, the derived adjoint mod- 
els approach optimal efficiency, automatically employ optimal checkpointing schemes, 
and inherit parallel support from the forward models. 

This paper is organised as follows. In section [2j we describe the fundamental 
abstraction underlying libadjoint and give a brief review of libadjoint. In section [3j we 
outline the implementation of transient finite element forward models in the FEniCS 
framework. The integration of FEniCS and libadjoint is implemented in a software 
framework called dolfin-adjoint, which is described in section [4] The advantages and 
limitations of the approach are discussed in section [5j Numerical examples drawn 
from a wide range of scientific applications are presented in section [6j before we make 
some concluding remarks in section [7j 

2. The fundamental abstraction of libadjoint. In this section, we detail 
the basic abstraction upon which libadjoint is based. This abstraction is to treat the 
model as a sequence of equation solves. As explained in |10j . this abstraction applies 
to both stationary and time-dependent systems of partial differential equations, and 
to both linear and nonlinear systems. 
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Fig. 1.2: The approach to adjoint model development advocated in this paper. The 
user specifies the discrete forward equations in a high-level language similar to mathe- 
matical notation; the discrete forward equations are explicitly represented in memory 
in the UFL format [I], libadjoint automatically derives the corresponding in-memory 
representation of the discrete adjoint equations, from the in-memory representation 
of the forward problem. Both the forward and adjoint equations are then passed to 
the FEniCS system, which automatically generates and executes the code necessary 
to compute the forward and adjoint solutions. 



2.1. Mathematical framework. We consider systems of discretised partial dif- 
ferential equations expressed in the fundamental abstract form 

A(u)u = b(u), (2.1) 

where u is the vector of all prognostic variables, b(u) is the source term, and A(u) 
the entire discretisation matrix. In the time-dependent case, u is a block-structured 
vector containing all the values of the unknowns at all the time levels, A is a matrix 
with a lower-triangular block structure containing all of the operators featuring in the 
forward model, and 6 is a block-structured vector containing all of the right-hand side 
terms for all of the equations solved in the forward model. The block-lower-triangular 
structure of A is a consequence of the forward propagation of information through 
time: later values depend on earlier values, but not vice versa. 



It is to be emphasised that writing the model in this format (2.1 ) does not imply 
that the whole of A is ever assembled at once, or the whole of u stored in memory. 
For instance, the forward solver will typically assemble one block-row of A, solve it 
for a block- component of u, forget as much as possible, and step forward in time. 

Let m be some parameter upon which the forward equations depend. For example, 
to could be a boundary condition, initial condition, or coefficient appearing in the 



equations. The tangent linear model associated with (2.1 1 is then given by 



where 



(A + G-R)^ = -f, (2.2) 
am am 



r)A 

G=~u, (2.3) 
ou 



and 



F = A(u)u-b(u). 



(2.5) 



THE ADJOINT OF HIGH-LEVEL FINITE ELEMENT PROGRAMS 



■5 



The unknown in ( 2.2 ) is the du/dm matrix, the Jacobian of the solution u with respect 
to the parameters m. 

Let J be some functional of the solution u. J is a function that takes in the 
system state, and returns a single scalar diagnostic. For example, in aeronautical 
design, J may be the drag coefficient associated with a wing; in meteorology, J may 
be the weighted misfit between observations of the atmosphere and model results. 



The adjoint model associated with (2.1| is given by 



(A + G-R)*z = 



5J 

du ' 



(2.6) 



where z is the adjoint solution associated with J [lOj . 

To make matters more concrete, consider the following example. Suppose the 
forward model approximately solves the time-dependent viscous Burgers' equation 



du 
dt 



u-Vu- V 2 u = /, 



(2.7) 



for the velocity u, subject to some suitable boundary conditions and a supplied initial 
condition u(0) = g, with source term /. Discretising with the Galerkin finite element 
method in space and the forward Euler method in time, and linearising the nonlin- 
ear advective term around the solution of the previous timestep yields the timestep 
iteration 



u 

Mu n+ i 



-(M - AtV (u n ) - AtD)u n + Atf n 



(2.8) 



where n is the timelevel, At is the timestep, M is the mass matrix, D is the diffusion 
matrix, V(u) is the advection matrix assembled at a given velocity u. For brevity, 
define 



T(-) = AtV(-) + AtD - M. 



(2.9) 



As described in [TU], (2.8| can be cast into the form of (2.1) by writing N timestep 
iterations as 



/ I 

T(u ) M 

T(ui) M 



V 



Ui 
"2 



T{u N _i) MJ \u n J 



( 9 \ 

At/o 
At/x 

\Atf N _J 



(2.10) 



Using (2.6 1, the adjoint system is given by 

m* (r( Ul ) + 



Zl 



M*J \z N J 



dJ_ 
du ' 



(2.11) 



() 
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The adjoint system reverses the temporal flow of information: where the forward 
and tangent linear models are block-lower-triangular, the adjoint model is block- 



upper-triangular, as visible in (2.11 ). The adjoint system is therefore typically solved 
by backward substitution: the adjoint variable associated with the end of time is 
solved for first, and then the solution proceeds backwards in time. 



At its heart, libadjoint takes models cast in the form of (2.1 1, and derives, as- 



sembles and solves the tangent linear (2.2) and adjoint (2.6) systems block-row by 
block-row. 

2.2. Libadjoint. Applying libadjoint to a model breaks down into two main 
tasks. These are described in detail in [10] and briefly outlined here. The automation 
of these two tasks by dolfin-adjoint is described in section [4j 

The first task, referred to as annotation, is to describe the forward code in the 



form of the fundamental abstraction (2.1 1. By describing the forward code in this 



form, libadjoint automates the reasoning necessary to derive the adjoint (2.6| and 



tangent linear (2.2) systems. As each equation is solved, the necessary semantic 
information about that equation is recorded. In particular, each equation records the 
variable solved for, the operators (matrices) that feature in the equation, and any 
dependencies these operators have on previous variables. This annotation is effected 
by making calls to library functions offered by the libadjoint API. In a low-level 
code, these calls must be inserted manually by the model developer, as the high- 
level semantic structure will have been obscured in the process of implementing the 
model. The annotation enables libadjoint to symbolically derive the structure of the 
discrete adjoint and tangent linear systems; however, without further information it 
cannot assemble the actual adjoint or tangent linear equations. The operators in the 
annotation are mere abstract handles. 

The second task is to supply libadjoint with function callbacks for the operators 
that feature in the annotation. If these operators depend on previously computed 
variables, their derivatives must also be supplied; the code for these derivative call- 
backs may be written by hand, or may be generated with an AD tool. With these 
callbacks, the derived equations may be automatically assembled. By modularising 
the forward model in this manner, libadjoint can drive the assembly of the forward, 
tangent linear and adjoint models, by calling the appropriate callbacks in the correct 
sequence. In a low-level code, these callbacks must be written manually. This can be 
a significant burden if the original forward code is poorly modularised. 

The use of libadjoint offers several advantages. It can be applied to models for 
which black-box AD is intractable [TUj, and gains the speed and efficiency benefits 
of applying AD judiciously [121 [7J Using libadjoint makes development system- 
atic: each incremental step in its application may be rigorously verified. As libadjoint 
internally derives a symbolic representation of the discrete adjoint equations, it can 
compute when a forward or adjoint variable is no longer necessary, and thus the 
model developer is relieved of the management of variable deallocation. Furthermore, 
libadjoint can automatically check the consistency of the adjoint computed with the 
original forward model; this check greatly improves the maintainability of adjoint 
codes, as developers can be immediately notified when a change to the forward model 
is not mirrored in the adjoint model. Finally, as libadjoint has sufficient information to 
re-assemble the forward equations, it is possible to implement checkpointing schemes 
entirely within the library itself. Checkpointing is a crucial feature for the efficient 
implementation of the adjoint of a time-dependent nonlinear model, but its imple- 
mentation can be prohibitively difficult. The availability of optimal checkpointing 



THE ADJOINT OF HIGH-LEVEL FINITE ELEMENT PROGRAMS 



7 



DOLFIN function signature Short description 

solve (Ins == rhs, u, bcs) Solve variational problem 

u.assign(u_) Copy function u_ to u 

assemble (a) Assemble variational form 

be. apply (A) Apply a boundary condition to a matrix 

solve (A, x, b) Solve linear system 

project (u, V) Project u onto the function space V 

Table 3.1: Important DOLFIN statements relevant to dolfin-adjoint. 



schemes within libadjoint is a significant advantage. 

3. The FEniCS system. The FEniCS Project is a collection of software compo- 
nents for automating the solution of differential equations [3 29]. These components 
include the Unified Form Language (UFL) pQ, the FEniCS Form Compiler (FFC) [23] . 
and DOLFIN [3T1 [32] . In the following, we only briefly outline the FEniCS pipeline, 
and refer the reader to the aforementioned references for more information. 

One of the key features of the FEniCS components is the use of code genera- 
tion, and in particular domain-specific code generation for finite element variational 
formulations: the user specifies the discrete variational problem to be solved in the 
domain-specific language UFL, the syntax of which mimics and encodes the mathe- 
matical formulation of the problem. Based on this high-level formulation, a special- 
purpose finite element form compiler generates optimised low-level C++ code for the 
evaluation of local clement tensors. The generated code is then used by DOLFIN 
to perform the global assembly and numerical solution. DOLFIN also provides the 
underlying data structures such as meshes, function spaces, boundary conditions and 
function values. DOLFIN provides both a C++ and a Python interface. For the 
C++ interface, the UFL specification and the form compilation must take place off- 
line, and the generated code is explicitly included by the user. In the Python interface, 
the functionality is seamlessly integrated by way of runtime just-in-time compilation. 
The approach taken by dolfin-adjoint is only applicable to models written using the 
Python interface. 

DOLFIN abstracts the spatial discretisation problem, but not the temporal dis- 
cretisation problem. Transient DOLFIN-based solvers typically consist of a hand- 
written temporal loop in which one or more discrete variational (finite element) prob- 
lems are solved with DOLFIN in each iteration. An overview of common operations 
relevant to dolfin-adjoint is given in table |3.1| 

At the highest level of abstraction, a DOLFIN model developer can define the 
variational problem for each timestep in terms of the unknown function, the left and 
right hand side forms and boundary conditions, and then call the DOLFIN solve 
function. If the variational problem is linear (represented by a left-hand side bilinear 
form, and a right-hand side linear form, a == L), the linear system of equations is 
assembled and solved under the hood. If the variational problem is nonlinear (repre- 
sented by a left-hand side rank-1 form and zero right-hand side, F == 0), a Newton 
iteration is invoked in which the Jacobian of the variational form F is derived auto- 
matically, and the linear system in each Newton iteration assembled and solved until 
the iteration has converged. The derivation of the Jacobian employs the algorithmic 
differentiation algorithms of UFL: because the form is represented in a high-level ab- 
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straction, the symbolic differentiation of the form is straightforward [U §17.5.2, §17.7]. 
When moving from one timestep to the next, the assign function is typically used 
to update the previous function with the new value. A sample solver demonstrating 



this type of usage, solving the nonlinear Burgers' equation, is listed in figure 4.1 



However, DOLFIN also supports more prescriptive programming models, in which 
explicit calls are employed to assemble matrices and solve the resulting linear systems. 
If the variational problem is linear and the left-hand side is constant, the assembly 
of the stiffness matrix may occur outside the temporal loop, and the matrix reused. 
The solution of the linear systems may also be further controlled by specifying direct 
LU or Krylov solvers, or even matrix-free solvers. 

DOLFIN's abstraction of a transient problem as an explicit sequence of variational 
problems exactly matches the fundamental abstraction of libadjoint; this matching of 
abstractions is the basis of the work presented here. 



from do If in import * 

from dolfin.adjoint import * 

n = 30 

mesh = Unitlnterval (n) 
V = FunctionSpace(mesh, "CG" , 2) 
ic = project ( Expression ( " sin (2* pi*x [0] )") , V) 
u = Function (ic , name=" Velocity " ) 
u_next = Function (V, name=" Next Velocity " ) 
v = TestFunction (V) 
nu = Constant (0.0001) 
timestep = Constant ( 1 . 0/ n) 
F = ((u_next — u) / timestep *v 

+ u.ncxt *grad ( u_next ) * v + nu*grad ( u_next ) * grad ( v ) ) * dx 
be = DirichletBC(V, 0.0, " on.boundary " ) 
t = 0.0; end = 0.2 
while ( t <= end ) : 

solve (F = 0, u_next , be) 

u.assign(u_next) 

t += float ( timestep ) 

adj _inc_timestep () 



Fig. 4.1: DOLFIN code for a simple discretisation of the Burgers equation (2.7 1 
with dolfin-adjoint annotations. The dolfin-adjoint module overloads the existing 
solve and assign functions, and allows the user to specify names of Functions 
for convenience. The only change to the code body is the introduction of a call to 
adj_inc_timestep to indicate to libadjoint that a new timestep is commencing. 



4. Applying libadjoint to DOLFIN. In this section, we present the internal 
details of how dolfin-adjoint integrates DOLFIN with libadjoint. This includes the 
annotation of the forward model execution, the recording of any necessary values, and 
the generation and registration of callback functions. All of these processes happen 
automatically, without any intervention by the model developer. 
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4.1. Annotations. The basic mechanism of automatically annotating DOLFIN 
models employed here is to overload the DOLFIN functions that change the values of 
variables. The overloaded versions annotate the event and then pass control to the 



original DOLFIN functions. All of the functions listed in table 3.1 are annotated 



Figure 4.1 presents a Burgers' equation model modified for use with dolfin-adjoint. 
The modifications and overloaded functions are highlighted. This demonstrates that 
only minimal source changes are required. 

In a low-level model, the information that libadjoint needs to record is not explic- 
itly represented as data in the code; therefore, annotation has to be at least partially 
done by hand, as the programmer must supply the information instead. By contrast, 
in the Python interface to DOLFIN, the equation to be assembled is explicitly repre- 
sented as data at runtime; DOLFIN's solve function takes as input the variational 
form of the equation to be solved, and so all information necessary for the annotation 
is available during the solve function. This makes the automatic annotation possible. 
By contrast, when using the C++ interface to DOLFIN, the code generation happens 
offline, and the equations are not explicitly represented as data at runtime; it is for 
this reason that the automated runtime derivation of the adjoint model in the manner 
described here is not possible. Therefore, dolfin-adjoint only supports the use of the 
Python interface to DOLFIN. 

When called, the adjoint-aware solve function inspects the left- and right-hand 
sides of the provided equation to determine the information needed by libadjoint. This 
includes the variable being solved for, the operators which feature in the equation, 
and their dependencies on values that were previously computed. UFL supports 
the interrogation of forms to automatically extract the necessary information. This 
information is then registered with libadjoint using the relevant API calls. When 
an overloaded call completes, dolfin-adjoint will record the resulting value if this is 
required for later use in the adjoint computation. 

For nonlinear solves expressed in the form F(u) — 0, where F is a rank-1 (vector) 
form, the adjoint code annotates the equivalent equation 

Mu = Mu-F(u), (4.1) 

where M is the mass matrix associated with the function space of u, so that it can be 



cast into the form of a row of (2.1 ). While it would be possible to annotate each linear 
solve in the Newton iteration, this would be inefficient: with this method, the adjoint 
run would have to rewind through each iteration of the nonlinear solver, whereas by 



annotating in the manner of (4.1 ), only one adjoint solve is necessary. This is akin to 
the method suggested in |12) , which computes the derivative of a Newton iteration in 
one step by linearising about the computed forward solution. 

In the cases where a developer of DOLFIN models pre-assembles forms into ten- 
sors using the assemble function, only low-level matrices and vectors are given as 
inputs to the solve call; the semantic information about the forms is no longer avail- 
able. This problem is resolved by supplying an overloaded assemble function which 
associates the form to be assembled with the assembled tensor. When the matrix- 
vector version of solve is called, it uses this association to recover the forms involved 
in the equation, and annotates as described above. 

The correctness of the adjoint relics on the correctness of the annotation. If the 
annotation does not exactly record the structure of the forward model, the gradient 
computed using the adjoint will be inconsistent. For example, the annotation would 
become inconsistent if the model accesses the underlying . vector () of a Function 



10 



P. E. FARRELL ET AL. 



and changes the values by hand; this would not be recorded by libadjoint. This 
restriction is mitigated somewhat by the fact that the replay feature of libadjoint can 
be used to check the correctness of the annotation. As will be discussed in section [O] 
libadjoint has sufficient information to replay the forward equations. This can be used 
to automatically check the correctness of the annotation, by replaying the annotation 
and comparing each value to that computed during the original model run. 

4.2. Callbacks. In the dolfin-adjoint code, the registration of callbacks occurs at 
the same time as the annotation, in the overloaded function calls. For each operator in 
the equations registered, Python functions are generated by dolfin-adjoint at runtime, 
and these are associated with the operator through the relevant libadjoint API calls. 
Depending on the precise details of the operator, libadjoint's requirements vary: if the 



operator is on the main diagonal of A in (2.1 ), the function returns the form itself (or 



its transpose, in the adjoint case), while if the operator is not on the main diagonal, 
the function returns the action of the form on a given input vector (or its transpose 
action). In either case, if the form depends on previously computed solutions, the 
derivative of the form with respect to these variables must be generated, along with 
the associated transposes. For the computation of these transposes and derivatives, 
we rely on the relevant features of UFL in which the form is expressed [H §17.5.2, 
§17.7].' 

The function callbacks also take references to information other than the form, 
so that the exact conditions of the forward solve can be recovered by dolfin-adjoint 
as necessary. In particular, time-dependent boundary conditions and forcing terms 
are implemented in DOLFIN via Expression classes which take in the current time 
as a parameter. Before a function is defined, the current value of every parameter 
is recorded, and the function then includes this record as part of its lexical closure. 
When the function is called, it restores each parameter value to the value it had 
when it was created. Similarly, functions take references to any Dirichlet boundary 
conditions that are applied to the equation as part of their lexical closure, so that 
the associated homogeneous Dirichlet boundary conditions may be applied to the 
corresponding adjoint or tangent linear equations. 

Again, because of the fact that the equation to be solved is represented as data at 
runtime, the definition of the callbacks can happen entirely automatically. Following 
the registration of these callbacks, libadjoint can compose the appropriate terms at 



will to assemble the adjoint or tangent linear system corresponding to (2.1 1 



4.3. The dolfin-adjoint user interface. The highest-level interface is the 
function compute_gradient: given a Functional J and a Parameter m, it computes 



the gradient dJ/dm using the adjoint solution. Example usage is given in figure 4.2 

If the user wishes to directly access the adjoint or tangent linear solutions, the 
functions compute .adjoint and compute_tlm are available. These functions iter- 
ate over the adjoint and tangent linear solutions, with the tangent linear solutions 
advancing in time and the adjoint solutions going backwards. A list of important 
dolfin-adjoint statements is given in table [4~T| 

5. Discussion. 

5.1. Checkpointing. The adjoint and tangent linear models are linearisations 
of the forward model. If the forward model is nonlinear, then the solution computed 
by the forward model must be available during the execution of the linearised models: 
the adjoint and tangent linear models depend on the forward solution. In the tangent 
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J = FinalFunctional ( . 5 * inner (u , u)*dx) 

ic_param = InitialConditionParameter (" Velocity " ) 

dJdic = compute_gradient ( J , ic_param) 

print norm(dJdic) 

plot(dJdic, interactive=True) 

Fig. 4.2: Sample dolfin-adjoint user code complementing the Burgers model presented 
in figure |4.1| the adjoint is generated and used to compute the gradient of the func- 
tional J = | u(T) ■ u(T) dx with respect to the initial condition u(0). 



dolfin-adjoint statement Short description 

J = FinalFunctional (0 . 5*inner (u, u)*dx) Functional J =\L u{T) ■ u{T) dx 

J = TimeFunctional(0 . 5*inner (u, u)*dx) Functional J = g J t Jq u(t) ■ u(t) dx dt 
param = InitialConditionParameter ("Velocity") Control variable for the initial velocity 

compute_gradient ( J , param) Computes the gradient dJ/du 

compute_adjoint (J) Generator for the adjoint solutions 

compute_tlm (param) Generator for the tangent linear solutions 



Table 4.1: Important dolfin-adjoint statements. 



linear case, this is not a major burden: the tangent linear system is block-lower- 
triangular, like the nonlinear forward model, and so each tangent linear equation can 
be solved immediately after the associated forward equation, with no extra storage of 
the forward solutions necessary. However, as the adjoint system is solved backwards 
in time, the forward solution (or the ability to recompute it) must be available for the 
entire length of the forward and adjoint solves. 

In large simulations, it quickly becomes impractical to store the entire forward 
solution through time at once. The alternative to storing the forward solutions is to 
recompute them when necessary from checkpoints stored during the forward run; how- 
ever, a naive recomputation scheme would greatly increase the computational burden 
of the problem to be solved. Therefore, some balance of storage and recomputation 
is necessary. This problem has been extensively studied in the algorithmic differen- 
tiation literature [TSl HH US US] and several algorithmic differentiation tools support 
the automatic implementation of such a checkpointing scheme. Of particular note is 
the algorithm of Griewank et al., which achieves logarithmic growth of the storage 
requirements and recomputation costs [151 116j and is provably optimal for the case in 
which the number of timesteps to be performed is known in advance |17j . 

Despite these theoretical advances, implementing a checkpointing scheme by hand 
is a major challenge. The complexity of programming them inhibits their widespread 
use. Many adjoint models do not implement checkpointing schemes |49j . or implement 
suboptimal checkpointing schemes. In order to implement a checkpointing scheme, 
the model control flow must jump between adjoint solves and forward solves, which 
is difficult to achieve if the model has a large or complex state. Instead, models may 
rely on temporal interpolation schemes to approximate the forward state. This choice 
introduces approximation errors into the adjoint equations, and the adjoint will no 
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longer supply a gradient consistent with the discrete forward model |47j . 

The main function of the annotation is to enable libadjoint to derive the associated 
adjoint and tangent linear models. However, libadjoint can also use the annotation 
without manipulation to assemble and solve the original forward model. While this 
may not be of direct utility to the model developer, the ability to replay the forward 
model means that checkpointing schemes may be implemented entirely within libad- 
joint itself [TO]. When the developer requests the assembly of an adjoint equation 
from libadjoint, libadjoint will detect any dependencies of that adjoint equation that 
are not available, and will automatically recompute the relevant forward solutions 
from the appropriate checkpoint, without any further intervention by the model de- 
veloper, libadjoint uses the revolve library |16| to identify the optimal placement of 
checkpoints, subject to the specified disk and memory storage limits. If the number 
of timesteps is known a priori, libadjoint uses revolve's optimal offline checkpointing 
algorithm 15 ; if this is not possible, libadjoint uses revolve's online checkpointing 
algorithm |46J. 

Once these basic checkpointing parameters have been specified, the only change 
required to the model code is to add a call to the adj _inc_timestep function at the end 
of the forward time loop to indicate that the model timestep has been incremented; 
libadjoint's use of revolve is fundamentally based on the timesteps of the model, 
but DOLFIN currently has no native concept of time or timestepping. From the 
perspective of the model developer, the fact that the checkpointing algorithm can be 
implemented entirely within libadjoint is attractive, as it allows for the adjoints of 
long forward runs to be achieved with optimal storage and recomputation costs for 
almost no extra developer effort. 

5.2. Parallelism. The algorithmic differentiation of parallel programs imple- 
mented using OpenMP or MPI is a major research challenge I s TTUTT]. Even when 
using an algorithmic differentiation tool to generate the majority of the adjoint code, 
this challenge means that the adjoints of the communication routines are usually 
written by hand. This was the strategy used by the parallel adjoint MITgcm ocean 
model PUH3]. 

In the libadjoint context, the annotation is orthogonal to the parallel implemen- 
tation of the model: the fact that the matrices and solutions happen to be distributed 
over multiple processing units is independent of the dependency structure of the equa- 
tions to be solved. However, the callbacks supplied by the developer must be parallel- 
aware: for example, the action callback of an operator must call the relevant halo 
update routine, and the transpose action must call the adjoint of the halo update 
routine. By itself, libadjoint does not remove the need for the adjoint of the parallel 
communication routines; the developer must still reverse the information flow of the 
communications to implement the transpose action callbacks. 

By contrast, DOLFIN handles all of these parallel communication patterns au- 
tomatically, even in the adjoint case. The high-level input to DOLFIN contains no 
parallel communication calls: DOLFIN derives the correct communication patterns 
at runtime. The adjoint equations to be solved are represented in the same UFL 
format as the forward model, and are passed to the same DOLFIN runtime system. 
The necessary parallel communication patterns for the adjoint equations are there- 
fore automatically derived in exactly the same way as the parallel communication 
patterns for the forward equations, for both the MPI and OpenMP cases [30] §6.4]. 
This circumvents the need to adjoin the parallel communication calls. Indeed, there 
is no parallel-specific code in dolfin-adjoint: by operating at this high level of ab- 
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straction, the problem of the reversal of communication patterns in the adjoint model 
simply vanishes. The remarkably straightforward implementation of the parallel ad- 
joint model is a major advantage of adopting the combination of a high-level finite 
element system and a high-level approach to deriving its adjoint. 



5.3. Matrix-free models. In some scientific applications, the problems to be 
solved are so large that it is not practical to explicitly store the matrix associated 



with a single block-equation in the sense of (2.1 1. For example, such problems arise 
in the Stokes equations for mantle convection [9] and the Boltzmann transport equa- 
tions for radiation transport |42j . In such situations, matrix- free solution algorithms, 
algorithms that never demand the whole matrix at once, are used instead. Instead of 
assembling a matrix M and passing it to the linear solver routine, a function / that 
computes the action Mv of the matrix on a given vector v is supplied. 

In general, automatically differentiating programs that use matrix-free solvers is 
difficult. If an algorithmic differentiation tool were to be applied in a naive black- 
box fashion, the analysis of the tool must trace the flow of information from the 
independent variables through the registration of the action function pointer into 
the inner loop of the matrix-free linear solver algorithm, and differentiate backwards 
through this chain. In the typical case, the linear solver algorithm is taken from 
an external library, which may be written in a different programming language, and 
for which the source may not be immediately available. Due to this difficulty, the 
only previous research on algorithmically differentiating matrix-free solvers of which 
the authors are aware has been limited to very specific interfaces in PETSc that are 
confined to structured meshes [23] . 

However, the fact that the model takes a matrix-free approach is ultimately an 
implementation detail: it is a choice made in how the discrete problem is to be solved, 
but it does not change the discrete problem or its adjoint. The difficulties faced by 
applying an algorithmic differentiation tool to models that use matrix-free solvers stem 
from the fact that the tool must untangle the mathematical structure of the problem 
from the details of how its solution is to be achieved; in the matrix-free case, this 
untangling is particularly difficult. However, with the high-level abstraction adopted 
in this work, such problems disappear: by operating at the level of equation solves, 
automatically deriving the discrete adjoint proceeds in exactly the same manner, and 
it does not matter whether those equation solves happen to assemble matrices or to 
use matrix-free solvers. 



6. Examples. In this section, we present three numerical examples drawn from a 
range of scientific applications, illustrating different discretisations and solution strate- 
gies. These examples were run with DOLFIN 1.0, libadjoint 0.9, and dolnn-adjoint 0.6. 
The software is freely available from http://fenicsproject.org/applications. 



6.1. Cahn-Hilliard. As an initial example, the Cahn-Hilliard solver from the 
DOLFIN examples collection was adjoined using the techniques described above. The 
Cahn-Hilliard equation is a nonlinear parabolic fourth-order partial differential cqua- 
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tion used to describe the separation of two components of a binary fluid [6]: 

dc V-M (v(f-e 2 \7 2 c)) =0 mil, (6.1) 



M ( V ( ? - e2v2c ) ) = on ^ 



9£ \ \ dc 

'dj_ 
dc 

Me 2 Vc ■ n = onffl, 
c(< = 0) — c on Q, 

where c is the unknown concentration field, M and e are scalar parameters, n is the 
unit normal, cq is the given initial condition, and / = 100c 2 (l — c) 2 . 

In order to solve the problem using a standard Lagrange finite element basis, the 
fourth-order PDE is separated into two coupled second-order equations: 

dc 

— - V ■ MV/i = in il, (6.2) 

M-^f+e 2 V 2 c = infl, (6.3) 

ac 

where fi is an additional prognostic variable. To solve the problem, the equations 
are cast into variational form and discretised with linear finite elements, and Crank- 
Nicolson timestepping applied [8]. 

The example code was slightly modified for use with dolfin-adjoint. The sub- 
classing of objects to implement the equation was replaced with calls to solve, and 



adj_inc_timestep was added at the end of the timeloop as described in section 5.1 
These modifications were trivial, and involved changing less than ten lines of code. 
The functional chosen was the Willmore functional evaluated integrated over time 



MX/)./,!/)) = f /" ' / (eV 2 c(t)-±P\ dxdt 

(6.4) 

-/i(t) I dxdt, 



4e J t =o Jn\ €dc 

1 /•' ' r ( i ^ 



4e 



t=o Jn 



where T denotes the final time of the simulation. This functional is physically relevant, 
as it is intimately connected to the finite-time stability of transition solutions of the 
Cahn-Hilliard equation [4 . The problem was solved on a mesh of £1 = [0,1] 2 with 
501,264 vertices (> 1 million degrees of freedom), and ran for 50 timesteps with a 
timestep At = 5 x 10~ 6 . The solution was computed on 24 cores using DOLFIN's 
MPI support; both the forward and adjoint models ran in parallel with no further 
modification. The solution of the concentration field at the initial and final time are 



shown in figure 6.1 



To test the checkpointing implementation, libadjoint was configured to use the 
multistage checkpointing algorithm of revolve |45j . with 5 checkpoints available in 
memory and 10 on disk. As described in section |5.1[ the use of this checkpointing 
algorithm is entirely transparent to the DOLFIN user. 

To verify the correctness of the adjoint solution, the Taylor remainder convergence 
test was applied. Let W(cq) be the Willmore functional considered as a pure func- 



tion of the initial condition; i.e., to evaluate W(cq), solve the PDE (6.1) with initial 



condition cq to compute n(t), and then evaluate W as in (6.4). The Taylor remainder 
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convergence test is based on the observation that, given an arbitrary perturbation c 
to the initial conditions Co, 



W{c + he) -W(c ) — >0 aXO(\h\) 



(6.5) 



but that 



W(c + he) - W(c ) - h~c L VW — >0 at 0(\hf), 



(6.6) 



where the gradient \7W is computed using the adjoint solution z 



VVK = - 



z, 



dF 

den 



(6.7) 



where F = is the discrete system corresponding to (6.1 ) 18 . This test is extremely 
sensitive to even slight errors in the implementation of the adjoint, and rigorously 
checks that the computed gradient is consistent with the discrete forward model. The 
perturbation c was pseudorandomly generated, with each value uniformly distributed 
in [0,1]. 

The results of the Taylor remainder convergence test can be seen in table |6.1| 



As expected, the convergence orders (6.5| and (6.6 1 hold, indicating that the adjoint 



solution computed is correct. Similarly the tangent linear model was verified, with 
the functional evaluated at the final time rather than integrated. The functional 
gradient in the direction of the perturbation c is computed using the tangent linear 
model 



„ T „— /8W dc T A 
\dc T dc I 



(6.8) 



where is the Jacobian matrix of the final solution with respect to the initial con- 
dition. The tangent linear model computes the term c. For the tangent linear 
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W(c ) - W(cq) 



rder 



W(c ) - W(co) 



c T VW 



rder 



1 x 10" 7 


3.4826 x 10" 


6 




3.0017 x 10 


-9 




5 x 1(T 8 


1.7405 x 10~ 


6 


1.0006 


7.4976 x 10- 


10 


2.0013 


2.5 x 10~ 8 


8.7009 x 10" 


7 


1.0003 


1.8737 x 10- 


10 


2.0005 


1.25 x 1CT 8 


4.3499 x 10~ 


7 


1.0002 


4.6829 x 10- 


11 


2.0004 


6.25 x 10~ 9 


2.1749 x 10" 


7 


1.0001 


1.1716 x 10- 


11 


1.9989 



Table 6.1: The Taylor remainders for the Willmore functional W evaluated at a 
perturbed initial condition cq = cq + he, where the perturbation direction c is pseu- 
dorandomly generated. As expected, the Taylor remainder incorporating gradient 
information computed using the adjoint converges at second order, indicating that 
the functional gradient computed using the adjoint is correct. 



W(c ) - W(co) 



rder 



W(c ) - W(co) 



Co 



vw 



rder 



1 x 10" 
5 x 10" 
2.5 x 10" 
1.25 x 10 
6.25 x 10 



0.76441 
0.39007 
0.19698 
0.09897 
0.04960 



0.9705 
0.9857 
0.9929 
0.9965 



0.03120 
0.00773 
0.00192 
4.8005 x 10" 
1.1987 x 10- 



2.012 
2.006 
2.003 
2.001 



Table 6.2: The Taylor remainders for the Willmore functional W evaluated at a 
perturbed initial condition cq = cq + he, where the perturbation direction c is pseu- 
dorandomly generated. As expected, the Taylor remainder incorporating gradient 
information computed using the tangent linear model converges at second order, in- 
dicating that the functional gradient computed using the tangent linear model is 
correct. 



verification, the model was again run on 24 processors. The results of the Taylor 
remainder convergence test can be seen in table |6.2| As expected, the theoretical con- 
vergence orders hold, indicating that the tangent linear solution computed is correct. 

The efficiency of the adjoint implementation was benchmarked using a lower- 
resolution mesh as follows. First, the unannotated model was run. Then, the forward 
model was run again, with annotation, to quantify the cost of annotating the forward 
model. Finally, the forward and adjoint models were run together. During the forward 
run, all variables were stored, and checkpointing was not used during the adjoint run, 
to isolate the intrinsic cost of assembling the adjoint system. For each measurement, 
5 runs were performed on a single processor, and the minimum time taken for the 
computation was recorded. 

For this configuration, the Newton solver typically employs five linear solves. As 
the adjoint replaces each Newton solve with one linear solve, a coarse estimate of the 
optimal performance ratio is 1.2. The numerical results can be seen in table [6~3"1 The 
overhead of the annotation is less than 1%. The adjoint model takes approximately 
1.22 times the cost of the forward model. This ratio compares very well with the the- 
oretical estimate: the adjoint implementation achieves almost optimal performance. 



THE ADJOINT OF HIGH-LEVEL FINITE ELEMENT PROGRAMS 



17 





Runtime (s) 


Ratio 


Forward model 


103.93 




Forward model + annotation 


104.24 


1.002 


Forward model + annotation + adjoint model 


127.07 


1.22 



Table 6.3: Timings for the Cahn-Hilliard adjoint. As can be seen, the overhead of 
annotation is less than 1%, and the adjoint model takes approximately 22% of the 
cost of the forward model. The experiment was performed for several different mesh 
sizes, with similar results obtained in each case. 



6.2. Stokes. As described in section [573} the approach presented in this paper is 
capable of deriving the adjoint for models that use matrix-free solvers. To demonstrate 
this capability, a matrix-free variant of the mantle convection model presented in [50] 
was adjoined: 

-V-o-- Vp= (RaT)e, 
V • u = 0, 

dT 

— +u- VT-V 2 T = 0, (6.9) 

where a is the deviatoric stress tensor, p is the pressure, Ra is the thermal Rayleigh 
number, T is the temperature, u is the velocity, and e is a unit vector in the direction 
of gravity (here the — xi direction). For the configuration reported below, the Rayleigh 
number Ra was set to 10 6 , which yields a vigorously convective system. For the Stokes 
equations, no slip conditions are applied on the top and bottom boundaries, and a 
stress-free condition is applied on the remainder of the boundary. For the temperature 
equation, Dirichlct conditions are applied on the top and bottom boundaries, while 
homogeneous Neumann conditions are applied on the left and right boundaries. The 
initial temperature field is set based on an analytical expression derived from boundary 
layer theory; for full details, see |50j . 

The Stokes equations were discretised using the P2 x Pi Taylor-Hood element, 
while the advection equation for the temperature field was discretised using the Pidg 
element. The solution of the Stokes equations was achieved in a matrix-free manner 
using DOLFIN's interfaces to the PETSc matrix-free solvers [2]. Both the forward 
and adjoint problems were parallelised using DOLFIN's OpenMP support, and were 
run on 8 cores. 

The functional taken was the Nusselt number of the temperature evaluated at the 
final time 

Nu(T) = / |^ dx 2 I [ T dx 2 , (6.10) 



r t o P 



dx 2 



which measures the efficiency of the convection by comparing the total heat transferred 
to that transferred by thermal conduction alone. The adjoint was computed using 
30 checkpoints on disk and 30 checkpoints in memory. The results are illustrated in 
figure RO 
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(c) 

Fig. 6.2: (a) The initial condition for temperature of the mantle convection simulation, 
(b) The temperature field after 200 timesteps. Plumes are clearly visible, (c) The 
gradient of the Nusselt number with respect to the temperature initial condition, 
computed using the adjoint. The scale on (c) is not the same as the scale on (a)-(b). 
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h 


\R\ 


order 


\R - hf T VNu| 


order 


7.5 x 1CT 3 


0.14568 




3.6260 x 10" 2 




3.75 x 1CT 3 


0.05905 


1.302 


4.3465 x 10~ 3 


3.06 


1.875 x 1(T 3 


0.02787 


1.083 


5.1659 x 10~ 4 


3.07 


9.375 x 1(T 4 


0.01373 


1.021 


5.2311 x 10~ 5 


3.30 



Table 6.4: The Taylor remainder R = Nu(T + f T) - Nu(T - |f) for the Nus- 

selt functional Nu evaluated at a perturbed initial condition, where the perturbation 
direction T is the vector of all ones. As expected, the higher-order Taylor remain- 
der incorporating gradient information converges at third order, indicating that the 
functional gradient computed using the adjoint is correct. 



The adjoint was verified using a high er-order analogue of the Taylor remainder 



convergence test described in section 6.1 Let Nu be the Nusselt number considered 
as a pure function of the initial condition for temperature; i.e., to evaluate Nu(Xb), 
solve the PDE (6.9 1 with initial condition Tq to compute the final temperature T, and 



then evaluate Nu(T) as in (6.10|. The test is based on the observation that, given an 
arbitrary perturbation T to the initial conditions To, 



Nu(T 



h ~, 
-T) 
2 ' 



Nu(T 



h ~ s 
-T 
2 ; 



but that 



Nu(T + -f ) - Nu(T 



h ~ x 
2 T ) 



/if T VNu 



at 0(\h\), 



at 0(\h\ c 



where the gradient VNu is computed using the adjoint solution z 

dF 



VNu 



z, 



dT 



(6.11) 



(6.12) 



(6.13) 



where F = is the discrete system corresponding to (6.9). The higher-order version 



of the Taylor remainder convergence test was used because it reaches the asymptotic 
region more rapidly, which is useful for strongly nonlinear problems such as this. 
The results of the Taylor remainder convergence test can be seen in table |6.4| As 



expected, the convergence orders (6.11) and (6.12) hold, indicating that the adjoint 
solution computed is correct. 

The efficiency of the adjoint implementation was benchmarked using the same 
procedure as described in section |6.1[ again using a lower-resolution configuration. 
The results can be seen in table |6.5| The overhead of the annotation is extremely 
small, within the distribution of timings of the unannotated run. The adjoint model 
takes approximately 86% of the cost of the forward model. In this case, the forward 
model performs two Picard iterations per timestep, each of which induce a corre- 
sponding linear solve in the adjoint equations. Therefore, a naive estimate of the 
ideal theoretical efficiency is 2. On investigation, the proximate cause of the ad- 
joint run being cheaper than the forward run was that the matrix-free linear solvers 
happened to converge more quickly during the adjoint run. 
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Runtime (s) 


Ratio 


Forward model 


96.03 




Forward model + annotation 


96.05 


1.0 


Forward model + annotation + adjoint model 


178.75 


1.86 



Table 6.5: Timings for the Stokes mantle convection adjoint. As can be seen, the 
overhead of annotation is extremely small, and the adjoint model takes approximately 
86% of the cost of the forward model. The experiment was performed for several 
different mesh sizes, with similar results obtained in each case. 



6.3. Viscoelasticity. Most biological tissue responds in a viscoelastic, rather 
than purely elastic, manner. As a final example, we consider a nontrivial discretisation 
of a viscoelastic model for the deformation and stress development in the upper part 
of the spinal cord under pressure induced by the pulsating flow of cerebrospinal fluid 

M- 

The Standard Linear Solid viscoelastic model equations can be phrased as: 
find the Maxwell stress tensor ao, the elastic stress tensor <n, the velocity v and the 
vorticity 7 such that 

d 

A\ — do + A° a <7 - Vw + 7 = 0, 
d 

Al-^ai-Vv + j = 0, ( 6 .i4) 
V- (<7 + 0-1) =0, 

skw((To + <7l) =0, 

for (t; x, y, z) € (0, T] x fi. Here, A\, A^, A\ are fourth-order compliance tensors, 
the divergence and gradient are taken row-wise, and skw denotes the skew-symmetric 
component of a tensor field. The total stress tensor a is the sum of the Maxwell and 
elastic contributions. In the isotropic case, each of the compliance tensors A®, A®, A\ 
reduce to two-parameter maps: 

a j = i }) 1 > C j £ = t¥ + A}tr(e)I, ( 6 - 15 ) 

where /ij, A* are positive Lame parameters. The system is closed by initial conditions 
for the Maxwell stress, essential boundary conditions for the velocity, and traction 
boundary conditions for the total stress a ■ n, where n is the outward normal on the 
domain boundary. The cord was kept fixed at the top and bottom, and the parameters 
were set to ^0 = 37.466, A„ = 10 4 , (i\ = 4.158, A° = 10 3 , fi\ = 2.39, A} = 10 3 (kPa). 
The traction boundary condition was set to 

a ■ h = ~pn, 

where p is a periodically varying pressure modelled as p(t] x, y, z) = asin(27r<)(171 — 
78)~ 1 (z — 78) (kPa), where a = 0.05 is the amplitude of the pressure. 



The discretisation of (6.141 is performed using the (locking-free) scheme intro- 
duced in [43] . allowing for direct approximation of the stresses, while enforcing the 
symmetry of the total stress weakly. The temporal discretisation is carried out via 



THE ADJOINT OF HIGH-LEVEL FINITE ELEMENT PROGRAMS 



21 



Fig. 6.3: The magnitude of the Maxwell stress in the horizontal plane at t = 1.25. 



a two-step TR — BDF 2 scheme; that is, a Crank-Nicolson step followed by a 2-step 
backward difference scheme, while the spatial, mixed finite element discretisation is 
based on seeking approximations oo(t), 0i(t), v(t),j(t) in the space 

Z = BDMj x BDMj x P 3 0DG x P 3 0DG . (6.16) 

where BDM X denotes the lowest-order 7J(div)-conforming Brezzi-Douglas-Marini el- 
ements and Podg denotes piecewise constants. See [43] for more details. 

Abnormal stress conditions in the interior of the spinal cord may be of biomedical 
interest [571 HI]- I n particular, we focus on the contribution of the Maxwell stress 
tensor in the horizontal plane at the final time T: 

J= [ (a (T) ■ e z f Ax, (6.17) 

where e z denotes a unit vector in the z-direction. 

The problem was solved on a tetrahedral mesh generated from patient-specific 
imaging data, yielding a total of 879 204 degrees of freedom, for t € [0, 1.25] with a 
time step of At — 0.01. As the system is linear, the matrices corresponding to each 
of the steps in the TR — BDF2 scheme were prc-asscmbled outside of the timeloop 
and their LU factorisations were cached. This optimisation is recognised by dolfin- 
adjoint, which then applies the analogous factorisation strategy to the corresponding 
adjoint solve. The Maxwell stress in the horizontal plane at t = 1.25 is illustrated in 
figure |6.3| 

To verify the correctness of the adjoint solution, the Taylor remainder convergence 
test was applied. Let J(a) be the Maxwell stress functional considered as a pure 
function of the amplitude of the applied pressure; i.e., to evaluate J (a), solve the 



PDE (6.14) with pressure amplitude a to compute <jq(T), and then evaluate J as 



in (6.17). The results of the Taylor remainder convergence test can be seen in table 6.6 



As expected, the convergence orders (6.5) and (6.6) hold, indicating that the adjoint 
solution computed is correct. 
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Sa 


J (a + Sa) - J(a) 


order 


J(o + Sa) — J(a) — 


VJ- Sa 


order 


0.05 


9.1012 x KT 3 




3.0337 x 10- 


3 




0.025 


3.7921 x 10" 3 


1.2630 


7.58417 x 10 


-4 


2.0000 


0.0125 


1.7064 x 10~ 3 


1.1520 


1.8959 x 1Q- 


-4 


2.0000 


6.25 x 10~ 3 


8.0583 x 10" 4 


1.0824 


4.7397 x 10" 


-5 


2.0001 


3.125 x 10~ 3 


3.9106 x 10" 4 


1.0430 


1.1848 x 10" 


-5 


2.0001 



Table 6.6: The Taylor remainders for the functional given by (6.17). As expected, 
the Taylor remainder incorporating gradient information computed using the adjoint 
converges at second order, indicating that the functional gradient computed using the 
adjoint is correct. 



The efficiency of the adjoint implementation was benchmarked using the same 
procedure as described in section |6.1[ again using a lower-resolution configuration. 
The results can be seen in table |6.7| As the problem is linear, the theoretical estimate 
of the ideal efficiency ratio is 2, which is approximately achieved by the implementa- 
tion presented here. 

7. Conclusion. Naumann's recent book on algorithmic differentiation states (37J 
pg. xii]: 

[T]hc automatic generation of optimal (in terms of robustness and 
efficiency) adjoint versions of large-scale simulation code is one of 
the great open challenges in the field of High-Performance Scientific 
Computing. 

The framework presented here, dolfin-adjoint, provides a robust and efficient mecha- 
nism for automatically deriving adjoint and tangent linear models of a wide variety 
of finite element models implemented in the Python interface to the DOLFIN library. 
Only minimal changes are required to adapt such a forward model for use with dolfin- 
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Runtime (s) 


Ratio 


Forward model 


119.93 




Forward model + annotation 


120.24 


1.002 


Forward model + annotation + adjoint model 


243.99 


2.029 



Table 6.7: Timings for the viscoelasticity adjoint. As can be seen, the overhead of 
annotation is less than 1%, and the adjoint model takes almost exactly the same 
length of time as the forward model, in accordance with the expectation from theory. 
The experiment was performed for several different mesh sizes, with similar results 
obtained in each case. 



adjoint. The adjoint model draws on the advantages of libadjoint to deliver optimal 
checkpointing strategies, and inherits the seamless parallelism of the FEniCS frame- 
work. The numerical results obtained demonstrate optimal efficiency in the adjoint 
model. 

The approach employed in dolfin-adjoint is analogous but fundamentally differ- 
ent to that adopted by algorithmic differentiation: by operating at a higher level of 
abstraction, a much greater degree of automation and efficiency has been achieved. 
The framework presented here enables adjoint models to be derived automatically, 
reliably and robustly, relieving the model developers of the adjoint development task. 
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